binom (Binomial distribution)#

The binomial distribution models the number of successes in a fixed number of independent yes/no trials.

This notebook uses the same parameterization as scipy.stats.binom:

  • n = number of trials (a non-negative integer)

  • p = success probability per trial (a number in [0, 1])

Learning goals#

By the end you should be able to:

  • recognize when a binomial model is appropriate (and when it isn’t)

  • write down the PMF/CDF and key properties

  • derive the mean, variance, and likelihood

  • implement binomial sampling using NumPy only

  • visualize PMF/CDF and validate with Monte Carlo simulation

  • use scipy.stats.binom for computation and estimation workflows

Table of contents#

  1. Title & Classification

  2. Intuition & Motivation

  3. Formal Definition

  4. Moments & Properties

  5. Parameter Interpretation

  6. Derivations

  7. Sampling & Simulation

  8. Visualization

  9. SciPy Integration

  10. Statistical Use Cases

  11. Pitfalls

  12. Summary

import math

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=6, suppress=True)

rng = np.random.default_rng(42)

1) Title & Classification#

Name: binom (Binomial distribution)
Type: Discrete

Support: [ k \in \{0, 1, 2, \dots, n\} ]

Parameter space: [ n \in \{0, 1, 2, \dots\},\qquad p \in [0, 1] ]

Interpretation:

  • n is the number of independent trials

  • p is the probability of “success” on each trial

2) Intuition & Motivation#

If you repeat the same yes/no experiment n times, and each trial succeeds with probability p, then

[ X \sim \mathrm{Binomial}(n, p) ]

means:

X counts how many successes you got out of n.

What this distribution models#

  • counting successes in a fixed number of independent trials

  • each trial has the same success probability

Typical real-world use cases#

  • A/B tests: number of conversions out of n visitors

  • quality control: number of defective items in a batch (with independence as an approximation)

  • reliability: number of working components out of n

  • epidemiology: number of positive tests out of n (when tests are conditionally independent)

Relations to other distributions#

  • Bernoulli: if n = 1, then Binomial(1, p) is Bernoulli(p).

  • Poisson approximation: if n is large and p is small with (\lambda = np) fixed, then (\mathrm{Binomial}(n,p) \approx \mathrm{Poisson}(\lambda)).

  • Normal approximation (CLT): for large n with (np(1-p)) not tiny, (X \approx \mathcal{N}(np,\, np(1-p))) (often with a continuity correction).

  • Beta–Binomial: if p itself is random with a Beta prior, the marginal count distribution becomes beta–binomial.

  • Negative binomial: counts trials needed to reach a fixed number of successes (“fixed successes” vs “fixed trials”).
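The Poisson approximation above is easy to check numerically. A minimal sketch using scipy.stats (the parameter values are illustrative; Le Cam's inequality bounds the total variation distance by (np^2)):

```python
import numpy as np
from scipy.stats import binom, poisson

# Large n, small p, with lambda = n*p held moderate: Bin(n, p) ≈ Poisson(lambda)
n_big, p_small = 2000, 0.002
lam = n_big * p_small  # 4.0

ks = np.arange(16)
pmf_exact = binom.pmf(ks, n_big, p_small)
pmf_approx = poisson.pmf(ks, lam)

# Largest pointwise gap between the two PMFs; Le Cam bounds the
# total variation distance by n * p^2 = 0.008 here.
max_abs_gap = float(np.max(np.abs(pmf_exact - pmf_approx)))
```

Shrinking p (while holding (np) fixed by growing n) drives the gap toward zero.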

3) Formal Definition#

Let (X) be the number of successes in (n) independent Bernoulli trials with success probability (p).

PMF#

For (k \in \{0,1,\dots,n\}):

[ \mathbb{P}(X = k) = \binom{n}{k} p^k (1-p)^{n-k} ]

and (\mathbb{P}(X=k)=0) for integers (k\notin\{0,\dots,n\}).

CDF#

For a real number (x), the CDF is

[ F(x) = \mathbb{P}(X \le x) = \sum_{k=0}^{\lfloor x \rfloor} \binom{n}{k} p^k (1-p)^{n-k} ]

with the conventions (F(x)=0) for (x<0) and (F(x)=1) for (x\ge n).

A useful special-function identity (often used for numerical evaluation) is:

[ \mathbb{P}(X \le k) = I_{1-p}(n-k,\, k+1) ]

where (I) is the regularized incomplete beta function.
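This identity can be verified in one line against scipy.special.betainc, SciPy's regularized incomplete beta (the values of n, p, k below are arbitrary, with k < n so both beta parameters stay positive):

```python
from scipy.special import betainc
from scipy.stats import binom

# Check: P(X <= k) = I_{1-p}(n - k, k + 1)
n, p, k = 30, 0.4, 12
cdf_direct = float(binom.cdf(k, n, p))
cdf_via_beta = float(betainc(n - k, k + 1, 1 - p))
# The two values should agree to floating-point precision.
```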

4) Moments & Properties#

Mean and variance#

[ \mathbb{E}[X] = np,\qquad \mathrm{Var}(X) = np(1-p) ]

Skewness and kurtosis#

Let (\sigma^2 = np(1-p)). Then

[ \text{skew}(X) = \frac{1-2p}{\sigma} = \frac{1-2p}{\sqrt{np(1-p)}} ]

The excess kurtosis is

[ \text{excess kurt}(X) = \frac{1 - 6p(1-p)}{\sigma^2} = \frac{1 - 6p(1-p)}{np(1-p)} ]

(so the full kurtosis is (3 + \text{excess kurt}(X))).

MGF and characteristic function#

With (q = 1-p),

[ M_X(t) = \mathbb{E}[e^{tX}] = (q + p e^t)^n ]

[ \varphi_X(t) = \mathbb{E}[e^{itX}] = (q + p e^{it})^n ]
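The closed form for the MGF is easy to verify against direct summation over the PMF. A minimal check (the PMF is recomputed here with math.comb so the snippet stands alone; n, p, t are arbitrary):

```python
import math

import numpy as np

n, p = 15, 0.3
q = 1.0 - p
ks = np.arange(n + 1)
pmf = np.array([math.comb(n, k) * p**k * q ** (n - k) for k in ks])

t = 0.7
mgf_from_pmf = float(np.sum(pmf * np.exp(t * ks)))   # E[e^{tX}] by direct summation
mgf_closed_form = float((q + p * math.exp(t)) ** n)  # (q + p e^t)^n
```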

Entropy#

The (Shannon) entropy is

[ H(X) = -\sum_{k=0}^{n} \mathbb{P}(X=k)\,\log \mathbb{P}(X=k) ]

There is no simple closed form in general, but it is easy to compute numerically for moderate n.

Other useful properties#

  • Mode: (\lfloor (n+1)p \rfloor), with a tie when ((n+1)p) is an integer.

  • Additivity (same p): if (X\sim\text{Bin}(n_1,p)) and (Y\sim\text{Bin}(n_2,p)) independent, then (X+Y\sim\text{Bin}(n_1+n_2,p)).

  • Complement symmetry: if (X\sim\text{Bin}(n,p)), then (n-X\sim\text{Bin}(n,1-p)).
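All three properties can be spot-checked numerically with scipy.stats.binom (the parameter values below are arbitrary, chosen so that ((n+1)p) is an integer and the mode tie is visible):

```python
import numpy as np
from scipy.stats import binom

# Mode tie: with n = 19, p = 0.4 we have (n + 1) * p = 8, an integer,
# so k = 7 and k = 8 are tied modes.
pmf = binom.pmf(np.arange(20), 19, 0.4)
tie_gap = abs(pmf[7] - pmf[8])

# Additivity (same p): the PMF of Bin(n1, p) + Bin(n2, p), obtained by
# convolving the two PMFs, matches Bin(n1 + n2, p).
n1, n2, p_shared = 6, 9, 0.3
pmf_conv = np.convolve(
    binom.pmf(np.arange(n1 + 1), n1, p_shared),
    binom.pmf(np.arange(n2 + 1), n2, p_shared),
)
pmf_direct = binom.pmf(np.arange(n1 + n2 + 1), n1 + n2, p_shared)

# Complement symmetry: P(X = k | n, p) = P(X = n - k | n, 1 - p)
sym_gap = abs(binom.pmf(5, 19, 0.4) - binom.pmf(14, 19, 0.6))
```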

def _validate_n_p(n, p):
    if isinstance(n, bool) or not isinstance(n, (int, np.integer)):
        raise TypeError("n must be an integer")
    n_int = int(n)
    if n_int < 0:
        raise ValueError("n must be >= 0")

    p_float = float(p)
    if not (0.0 <= p_float <= 1.0):
        raise ValueError("p must be in [0, 1]")
    return n_int, p_float


def binom_logpmf(k, n, p):
    # Log PMF for Bin(n,p) for integer k. Returns -inf outside support.
    n, p = _validate_n_p(n, p)
    k_arr = np.asarray(k)

    out = np.full(k_arr.shape, -np.inf, dtype=float)

    k_int = k_arr.astype(int)
    valid = (k_int == k_arr) & (k_int >= 0) & (k_int <= n)
    if not np.any(valid):
        return out

    if p == 0.0:
        out[valid & (k_int == 0)] = 0.0
        return out
    if p == 1.0:
        out[valid & (k_int == n)] = 0.0
        return out

    kv = k_int[valid]
    log_binom_coeff = (
        math.lgamma(n + 1)
        - np.vectorize(math.lgamma)(kv + 1)
        - np.vectorize(math.lgamma)(n - kv + 1)
    )

    out[valid] = log_binom_coeff + kv * math.log(p) + (n - kv) * math.log1p(-p)
    return out


def binom_pmf(k, n, p):
    return np.exp(binom_logpmf(k, n, p))


def binom_pmf_support(n):
    n, _ = _validate_n_p(n, 0.5)
    return np.arange(n + 1)


def binom_pmf_array(n, p):
    ks = binom_pmf_support(n)
    return ks, binom_pmf(ks, n, p)


def binom_cdf(x, n, p):
    n, p = _validate_n_p(n, p)
    x_arr = np.asarray(x)

    _, pmf = binom_pmf_array(n, p)
    cdf = np.cumsum(pmf)
    cdf = np.clip(cdf, 0.0, 1.0)

    k = np.floor(x_arr).astype(int)
    out = np.zeros_like(x_arr, dtype=float)
    out[x_arr >= n] = 1.0

    inside = (x_arr >= 0) & (x_arr < n)
    if np.any(inside):
        out[inside] = cdf[k[inside]]
    return out


def binom_moments(n, p):
    n, p = _validate_n_p(n, p)
    mean = n * p
    var = n * p * (1.0 - p)

    if var == 0.0:
        skew = float("nan")
        excess_kurt = float("nan")
    else:
        skew = (1.0 - 2.0 * p) / math.sqrt(var)
        excess_kurt = (1.0 - 6.0 * p * (1.0 - p)) / var

    return {
        "mean": mean,
        "var": var,
        "skew": skew,
        "excess_kurt": excess_kurt,
    }


def binom_entropy(n, p, *, base=math.e):
    n, p = _validate_n_p(n, p)
    _, pmf = binom_pmf_array(n, p)

    mask = pmf > 0
    H_nats = -np.sum(pmf[mask] * np.log(pmf[mask]))

    if base == math.e:
        return float(H_nats)
    return float(H_nats / math.log(base))
n, p = 40, 0.25

moments = binom_moments(n, p)
moments
{'mean': 10.0,
 'var': 7.5,
 'skew': 0.18257418583505536,
 'excess_kurt': -0.016666666666666666}
# Monte Carlo check (matches formulas up to sampling error)
samples = rng.binomial(n=n, p=p, size=200_000)

est_mean = samples.mean()
est_var = samples.var(ddof=0)

{
    "formula_mean": moments["mean"],
    "mc_mean": float(est_mean),
    "formula_var": moments["var"],
    "mc_var": float(est_var),
    "entropy_nats": binom_entropy(n, p),
}
{'formula_mean': 10.0,
 'mc_mean': 9.999065,
 'formula_var': 7.5,
 'mc_var': 7.510134125775002,
 'entropy_nats': 2.423340794934518}

5) Parameter Interpretation#

  • n (number of trials) controls the scale and granularity. Larger n tends to make the distribution wider and (often) more bell-shaped.

  • p (success probability) controls the location and skew:

    • if p < 0.5, most mass is near 0 with right-skew

    • if p = 0.5, the distribution is symmetric around n/2

    • if p > 0.5, most mass is near n with left-skew

Two helpful identities: [ \mathbb{E}[X]=np,\qquad \mathrm{Var}(X)=np(1-p) ]

so increasing p shifts the distribution right, and increasing n typically increases both the mean and the variance.

from plotly.subplots import make_subplots

n_fixed = 20
p_values = [0.1, 0.3, 0.5, 0.7, 0.9]

p_fixed = 0.3
n_values = [5, 20, 100]

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=(
        f"PMF vs p (n={n_fixed})",
        f"PMF vs n (p={p_fixed})",
    ),
)

ks = binom_pmf_support(n_fixed)
for p_ in p_values:
    fig.add_trace(
        go.Scatter(
            x=ks,
            y=binom_pmf(ks, n_fixed, p_),
            mode="markers+lines",
            name=f"p={p_}",
        ),
        row=1,
        col=1,
    )

for n_ in n_values:
    ks = binom_pmf_support(n_)
    fig.add_trace(
        go.Scatter(
            x=ks,
            y=binom_pmf(ks, n_, p_fixed),
            mode="markers+lines",
            name=f"n={n_}",
        ),
        row=1,
        col=2,
    )

fig.update_xaxes(title_text="k", row=1, col=1)
fig.update_yaxes(title_text="P(X=k)", row=1, col=1)
fig.update_xaxes(title_text="k", row=1, col=2)
fig.update_yaxes(title_text="P(X=k)", row=1, col=2)
fig.update_layout(height=420, legend_title_text="")

fig.show()

6) Derivations#

A standard way to derive binomial moments is to write (X) as a sum of indicator variables.

Expectation#

Let (I_i) indicate success on trial (i): (I_i\in\{0,1\}) with (\mathbb{P}(I_i=1)=p).

Then [ X = \sum_{i=1}^{n} I_i ]

and by linearity of expectation: [ \mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[I_i] = \sum_{i=1}^n p = np ]

Variance#

If trials are independent, (\mathrm{Cov}(I_i,I_j)=0) for (i\ne j). So:

[ \mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(I_i) = \sum_{i=1}^n p(1-p) = np(1-p) ]

Likelihood (single observation)#

If you observe (X=k) with known n, the likelihood of p is

[ L(p\mid k) = \binom{n}{k} p^k (1-p)^{n-k} ]

Taking logs (and dropping constants that do not depend on p):

[ \ell(p) = k\log p + (n-k)\log(1-p) ]

Differentiate and set to zero:

[ \ell'(p) = \frac{k}{p} - \frac{n-k}{1-p} = 0 \quad\Longrightarrow\quad \hat p = \frac{k}{n} ]

So the MLE is the observed success fraction.

# Visualize the likelihood for p (single observation)
n = 30
k_obs = 9

p_grid = np.linspace(1e-6, 1 - 1e-6, 600)
log_binom_coeff = math.lgamma(n + 1) - math.lgamma(k_obs + 1) - math.lgamma(n - k_obs + 1)
logL = log_binom_coeff + k_obs * np.log(p_grid) + (n - k_obs) * np.log1p(-p_grid)

p_hat = k_obs / n

fig = go.Figure()
fig.add_trace(go.Scatter(x=p_grid, y=logL, mode="lines", name="log-likelihood"))
fig.add_vline(x=p_hat, line_dash="dash", line_color="black", annotation_text=f"MLE p̂={p_hat:.3f}")
fig.update_layout(title="Binomial log-likelihood for p (n fixed)", xaxis_title="p", yaxis_title="log L(p)")
fig.show()

7) Sampling & Simulation#

Below are two NumPy-only sampling strategies.

A) Sum of Bernoulli trials#

Directly simulate n Bernoulli trials and count successes.

  • Correct by construction

  • Cost: (O(n\cdot \text{size}))

B) Inverse CDF (inverse transform sampling)#

  1. Compute probabilities (\mathbb{P}(X=k)) for (k=0,\dots,n)

  2. Compute the cumulative sum to get the CDF on the support

  3. Draw (U\sim\text{Uniform}(0,1)) and return the smallest (k) with (F(k)\ge U)

  • Cost: (O(n + \text{size}\log n)) with binary search (np.searchsorted)

  • Useful for didactic purposes; production implementations use more specialized algorithms

def sample_binom_bernoulli_sum(n, p, size=1, *, rng: np.random.Generator):
    n, p = _validate_n_p(n, p)
    u = rng.random(size=np.atleast_1d(size).tolist() + [n])
    return (u < p).sum(axis=-1)


def sample_binom_inverse_cdf(n, p, size=1, *, rng: np.random.Generator):
    n, p = _validate_n_p(n, p)

    if p == 0.0:
        return np.zeros(size, dtype=int)
    if p == 1.0:
        return np.full(size, n, dtype=int)

    ks = np.arange(n + 1)

    # Compute PMF stably via log-space, then normalize.
    logp = binom_logpmf(ks, n, p)
    logp = logp - np.max(logp)
    pmf = np.exp(logp)
    pmf = pmf / pmf.sum()

    cdf = np.cumsum(pmf)
    cdf[-1] = 1.0

    u = rng.random(size=size)
    return np.searchsorted(cdf, u, side="left")
n, p = 25, 0.35
size = 50_000

x1 = sample_binom_bernoulli_sum(n, p, size=size, rng=rng)
x2 = sample_binom_inverse_cdf(n, p, size=size, rng=rng)

{
    "bernoulli_sum_mean": float(x1.mean()),
    "inverse_cdf_mean": float(x2.mean()),
    "theory_mean": n * p,
}
{'bernoulli_sum_mean': 8.74378,
 'inverse_cdf_mean': 8.74036,
 'theory_mean': 8.75}

8) Visualization#

We’ll visualize:

  • the PMF on (\{0,\dots,n\})

  • the CDF (as a step function)

  • Monte Carlo samples vs the exact PMF

n, p = 30, 0.4
ks, pmf = binom_pmf_array(n, p)
cdf = np.cumsum(pmf)

fig_pmf = go.Figure()
fig_pmf.add_trace(go.Bar(x=ks, y=pmf, name="PMF"))
fig_pmf.update_layout(title=f"Binomial PMF (n={n}, p={p})", xaxis_title="k", yaxis_title="P(X=k)")
fig_pmf.show()

fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=ks, y=cdf, mode="lines", line_shape="hv", name="CDF"))
fig_cdf.update_layout(title=f"Binomial CDF (n={n}, p={p})", xaxis_title="k", yaxis_title="P(X≤k)")
fig_cdf.show()

mc = sample_binom_inverse_cdf(n, p, size=200_000, rng=rng)
counts = np.bincount(mc, minlength=n + 1)
pmf_hat = counts / counts.sum()

fig_mc = go.Figure()
fig_mc.add_trace(go.Bar(x=ks, y=pmf_hat, name="Monte Carlo", opacity=0.6))
fig_mc.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="Exact PMF"))
fig_mc.update_layout(
    title=f"Monte Carlo vs exact PMF (n={n}, p={p})",
    xaxis_title="k",
    yaxis_title="Probability",
)
fig_mc.show()

9) SciPy Integration#

SciPy provides a fast, numerically robust implementation via scipy.stats.binom.

  • Use pmf, cdf, sf (survival function), rvs, logpmf, logcdf, …

  • For parameter estimation, binom does not currently expose a .fit() method (SciPy 1.15). For n known, the MLE for p is closed-form; otherwise you can do custom optimization.

from scipy.optimize import minimize_scalar
from scipy.stats import binom

n, p = 12, 0.2
ks = np.arange(n + 1)

pmf_scipy = binom.pmf(ks, n, p)
cdf_scipy = binom.cdf(ks, n, p)
samples_scipy = binom.rvs(n, p, size=10_000, random_state=rng)

{
    "pmf_sum": float(pmf_scipy.sum()),
    "cdf_last": float(cdf_scipy[-1]),
    "sample_mean": float(samples_scipy.mean()),
    "theory_mean": n * p,
}
{'pmf_sum': 1.0,
 'cdf_last': 1.0,
 'sample_mean': 2.4082,
 'theory_mean': 2.4000000000000004}
# "Fit" p with n known (closed-form MLE + SciPy optimization check)
n_true, p_true = 20, 0.35
data = binom.rvs(n_true, p_true, size=2_000, random_state=rng)

p_mle_closed = data.mean() / n_true

def nll(p):
    if not (0.0 < p < 1.0):
        return float("inf")
    return -binom.logpmf(data, n_true, p).sum()

res = minimize_scalar(nll, bounds=(1e-9, 1 - 1e-9), method="bounded")

{
    "p_true": p_true,
    "p_mle_closed": float(p_mle_closed),
    "p_mle_opt": float(res.x),
    "opt_success": bool(res.success),
}
{'p_true': 0.35,
 'p_mle_closed': 0.348475,
 'p_mle_opt': 0.3484751594559644,
 'opt_success': True}

10) Statistical Use Cases#

A) Hypothesis testing (exact binomial test)#

Test whether a success probability equals a reference value (p_0).

B) Bayesian modeling (Beta–Binomial conjugacy)#

If (p\sim\text{Beta}(\alpha,\beta)) and (X\mid p \sim\text{Bin}(n,p)), then the posterior is

[ p\mid X=k \sim \text{Beta}(\alpha + k,\, \beta + n - k) ]

This is one of the cleanest examples of conjugacy.

C) Generative modeling (counts conditioned on probabilities)#

Binomial observations show up whenever you generate counts given probabilities, e.g. modeling conversions, click-throughs, and success/failure outcomes at scale.

from scipy.stats import beta, betabinom, binomtest

# A) Hypothesis test
n = 50
k = 18
p0 = 0.25

test = binomtest(k=k, n=n, p=p0, alternative="two-sided")
ci = test.proportion_ci(confidence_level=0.95)

{
    "k": k,
    "n": n,
    "p0": p0,
    "p_value": float(test.pvalue),
    "95%_CI": (float(ci.low), float(ci.high)),
}
{'k': 18,
 'n': 50,
 'p0': 0.25,
 'p_value': 0.10037920682387039,
 '95%_CI': (0.22915706682058734, 0.5080686477145561)}
# B) Bayesian update with Beta prior
alpha0, beta0 = 2.0, 2.0
n, k = 50, 18

alpha_post = alpha0 + k
beta_post = beta0 + (n - k)

posterior_mean = alpha_post / (alpha_post + beta_post)
posterior_ci = beta.ppf([0.025, 0.975], alpha_post, beta_post)

x = np.linspace(0, 1, 400)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=beta.pdf(x, alpha0, beta0), mode="lines", name="prior Beta(2,2)"))
fig.add_trace(
    go.Scatter(
        x=x,
        y=beta.pdf(x, alpha_post, beta_post),
        mode="lines",
        name=f"posterior Beta({alpha_post:.0f},{beta_post:.0f})",
    )
)
fig.add_vline(
    x=posterior_mean,
    line_dash="dash",
    line_color="black",
    annotation_text=f"posterior mean={posterior_mean:.3f}",
)
fig.update_layout(title="Beta prior → Beta posterior for p", xaxis_title="p", yaxis_title="density")
fig.show()

{
    "posterior_mean": float(posterior_mean),
    "posterior_95%_CI": (float(posterior_ci[0]), float(posterior_ci[1])),
}
{'posterior_mean': 0.37037037037037035,
 'posterior_95%_CI': (0.24787219846491212, 0.5019667760094333)}
# Posterior predictive: future successes out of m trials
m = 20
k_new = np.arange(m + 1)

pred_pmf = betabinom.pmf(k_new, m, alpha_post, beta_post)

fig = go.Figure()
fig.add_trace(go.Bar(x=k_new, y=pred_pmf, name="posterior predictive"))
fig.update_layout(
    title=f"Posterior predictive for X_new | data (m={m})",
    xaxis_title="k_new",
    yaxis_title="P(X_new=k_new)",
)
fig.show()
# C) Simple generative example: daily conversions
days = 60
visitors = rng.integers(200, 1200, size=days)

# latent daily conversion rate (captures extra variability beyond pure binomial)
p_day = rng.beta(30, 70, size=days)
conversions = np.array([rng.binomial(n=v, p=p) for v, p in zip(visitors, p_day)])

df = {
    "day": np.arange(days),
    "visitors": visitors,
    "conversions": conversions,
    "rate": conversions / visitors,
}

fig = go.Figure()
fig.add_trace(go.Scatter(x=df["day"], y=df["rate"], mode="markers+lines", name="observed rate"))
fig.update_layout(title="Simulated conversion rates", xaxis_title="day", yaxis_title="conversions / visitors")
fig.show()

{
    "avg_visitors": float(np.mean(visitors)),
    "avg_rate": float(np.mean(df["rate"])),
}
{'avg_visitors': 623.9833333333333, 'avg_rate': 0.3130981451528444}

11) Pitfalls#

Invalid parameters#

  • n must be a non-negative integer.

  • p must lie in ([0,1]).

Numerical issues#

  • For large n, the PMF can underflow in floating point. Prefer log-space computations: scipy.stats.binom.logpmf, logcdf, etc.

  • For tail probabilities, prefer sf (survival function) over 1 - cdf to avoid catastrophic cancellation.
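A concrete illustration of the sf-vs-(1 - cdf) point, with a deep upper tail where the naive difference collapses to exactly 0.0 (parameter values chosen to put k about 16 standard deviations above the mean):

```python
from scipy.stats import binom

n, p = 10_000, 0.5
k = 5_800  # roughly 16 standard deviations above the mean np = 5000

tail_sf = binom.sf(k, n, p)            # P(X > k), computed directly: tiny but positive
tail_naive = 1.0 - binom.cdf(k, n, p)  # cdf rounds to 1.0 in float64, so this is 0.0
log_tail = binom.logsf(k, n, p)        # log-space survival function for even deeper tails
```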

Modeling issues#

  • Independence can be violated (e.g. clustered trials, repeated users). A beta–binomial model can capture over-dispersion.

  • Constant p across trials may be unrealistic (mixture models / hierarchical models often fit better).

  • For sampling without replacement, a hypergeometric distribution is more appropriate.
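The without-replacement effect is easy to quantify by comparing scipy.stats.hypergeom with its with-replacement binomial analogue (the lot sizes below are made up; note SciPy's argument order is population size, number of successes, number of draws):

```python
from scipy.stats import binom, hypergeom

# Draw n = 10 items without replacement from a lot of N = 50 with K = 15 defectives.
N, K, n_draw = 50, 15, 10

mean_hyper = hypergeom(N, K, n_draw).mean()  # equals n * K / N, same as the binomial
var_hyper = hypergeom(N, K, n_draw).var()
var_binom = binom(n_draw, K / N).var()       # with-replacement analogue

# Same mean, but the hypergeometric variance is shrunk by the
# finite-population correction factor (N - n) / (N - 1).
fpc = (N - n_draw) / (N - 1)
```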

Approximation gotchas#

  • Poisson approximation requires n large, p small, and (np) moderate.

  • Normal approximation is best when (np(1-p)) is reasonably large (rule-of-thumb: (\ge 10)).
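The value of the continuity correction can be seen directly; a sketch with scipy.stats (illustrative values, with (np(1-p) = 21) comfortably above the rule of thumb):

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.3
mu, sigma = n * p, np.sqrt(n * p * (1 - p))  # mu = 30, sigma ≈ 4.58

k = 35
exact = binom.cdf(k, n, p)
plain = norm.cdf(k, mu, sigma)            # no correction
corrected = norm.cdf(k + 0.5, mu, sigma)  # continuity correction: P(X <= k) ≈ Φ((k + 0.5 - μ)/σ)

err_plain = abs(plain - exact)
err_corrected = abs(corrected - exact)
```

The corrected approximation is typically an order of magnitude closer than the uncorrected one at this sample size.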

12) Summary#

  • binom is a discrete distribution on (\{0,\dots,n\}) counting successes in n Bernoulli trials.

  • PMF: (\binom{n}{k}p^k(1-p)^{n-k}), mean (np), variance (np(1-p)).

  • Derivations are clean via the indicator-sum representation (X=\sum I_i).

  • For computation and tail probabilities, prefer SciPy’s binom (especially in log-space).

  • When p varies across trials or trials aren’t independent, consider richer models (e.g. beta–binomial).

References#

  • SciPy: scipy.stats.binom and scipy.stats.binomtest

  • Any standard probability text (e.g., Casella & Berger, Statistical Inference)